AITopics | streaming data

Collaborating Authors

streaming data

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Neural Information Processing SystemsDec-24-2025, 18:51:13 GMT

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches thereby providing a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\epsilon})$ for any $\epsilon> 0$, respectively under the availability of two-sample and one-sample oracles respectively. Importantly, under the availability of the two-sample oracle, the aforementioned rate is actually agnostic to the relationship between confounder and the instrumental variable demonstrating the flexibility of the proposed approach in alleviating the need for explicit model assumptions required in recent works based on reformulating the problem as min-max optimization problems. Experimental validation is provided to demonstrate the advantages of the proposed algorithms over classical approaches like the 2SLS method.

artificial intelligence, machine learning, stochastic optimization algorithm, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Communications > Networks (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Learning Mixture of Gaussians with Streaming Data

Neural Information Processing SystemsNov-21-2025, 16:16:15 GMT

In this paper, we study the problem of learning a mixture of Gaussians with streaming data: given a stream of $N$ points in $d$ dimensions generated by an unknown mixture of $k$ spherical Gaussians, the goal is to estimate the model parameters using a single pass over the data stream. We analyze a streaming version of the popular Lloyd's heuristic and show that the algorithm estimates all the unknown centers of the component Gaussians accurately if they are sufficiently separated. Assuming each pair of centers are $C\sigma$ distant with $C=\Omega((k\log k)^{1/4}\sigma)$ and where $\sigma^2$ is the maximum variance of any Gaussian component, we show that asymptotically the algorithm estimates the centers optimally (up to certain constants); our center separation requirement matches the best known result for spherical Gaussians \citep{vempalawang}. For finite samples, we show that a bias term based on the initial estimate decreases at $O(1/{\rm poly}(N))$ rate while variance decreases at nearly optimal rate of $\sigma^2 d/N$. Our analysis requires seeding the algorithm with a good initial estimate of the true cluster centers for which we provide an online PCA based clustering algorithm. Indeed, the asymptotic per-step time complexity of our algorithm is the optimal $d\cdot k$ while space complexity of our algorithm is $O(dk\log k)$. In addition to the bias and variance terms which tend to $0$, the hard-thresholding based updates of streaming Lloyd's algorithm is agnostic to the data distribution and hence incurs an \emph{approximation error} that cannot be avoided. However, by using a streaming version of the classical \emph{(soft-thresholding-based)} EM method that exploits the Gaussian distribution explicitly, we show that for a mixture of two Gaussians the true means can be estimated consistently, with estimation error decreasing at nearly optimal rate, and tending to $0$ for $N\rightarrow \infty$.

algorithm, gaussian, learning mixture, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.74)
Information Technology > Data Science > Data Mining (0.58)

Add feedback

Variance-Reduced Stochastic Gradient Descent on Streaming Data

Neural Information Processing SystemsNov-20-2025, 23:01:19 GMT

We present an algorithm STRSAGA for efficiently maintaining a machine learning model over data points that arrive over time, quickly updating the model as new training data is observed. We present a competitive analysis comparing the sub-optimality of the model maintained by STRSAGA with that of an offline algorithm that is given the entire data beforehand, and analyze the risk-competitiveness of STRSAGA under different arrival patterns. Our theoretical and experimental results show that the risk of STRSAGA is comparable to that of offline algorithms on a variety of input arrival patterns, and its experimental performance is significantly better than prior algorithms suited for streaming data, such as SGD and SSVRG.

name change, streaming data, variance-reduced stochastic gradient descent, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Coresets for k-Segmentation of Streaming Data

Neural Information Processing SystemsSep-30-2025, 09:33:49 GMT

Life-logging video streams, financial time series, and Twitter tweets are a few examples of high-dimensional signals over practically unbounded time. We consider the problem of computing optimal segmentation of such signals by k-piecewise linear function, using only one pass over the data by maintaining a coreset for the signal. The coreset enables fast further analysis such as automatic summarization and analysis of such signals. A coreset (core-set) is a compact representation of the data seen so far, which approximates the data well for a specific task -- in our case, segmentation of the stream. We show that, perhaps surprisingly, the segmentation problem admits coresets of cardinality only linear in the number of segments k, independently of both the dimension d of the signal, and its number n of points. More precisely, we construct a representation of size O(klog n /eps^2) that provides a (1+eps)-approximation for the sum of squared distances to any given k-piecewise linear function. Moreover, such coresets can be constructed in a parallel streaming approach. Our results rely on a novel eduction of statistical estimations to problems in computational geometry. We empirically evaluate our algorithms on very large synthetic and real data sets from GPS, video and financial domains, using 255 machines in Amazon cloud.

coreset, name change, streaming data, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Communications (0.39)
Information Technology > Artificial Intelligence (0.39)

Add feedback

Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Neural Information Processing SystemsMay-26-2025, 20:19:27 GMT

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms neither require matrix inversions nor mini-batches thereby providing a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation, that are of order \mathcal{O}(\log T/T) and \mathcal{O}(1/T {1-\epsilon}) for any \epsilon 0, respectively under the availability of two-sample and one-sample oracles respectively. Importantly, under the availability of the two-sample oracle, the aforementioned rate is actually agnostic to the relationship between confounder and the instrumental variable demonstrating the flexibility of the proposed approach in alleviating the need for explicit model assumptions required in recent works based on reformulating the problem as min-max optimization problems. Experimental validation is provided to demonstrate the advantages of the proposed algorithms over classical approaches like the 2SLS method.

artificial intelligence, machine learning, stochastic optimization algorithm, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Communications > Networks (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.65)

Add feedback

Reviews: Learning Mixture of Gaussians with Streaming Data

Neural Information Processing SystemsOct-8-2024, 12:49:47 GMT

This paper proposes a LLoyd type method with PCA initialization to estimate means of Gaussians in a streaming setting. References seem to be severely lacking as the domain is wide. The key point of the algorithm seems to be the initialization and there is no discussion on this part (comparison with the literature on k-means initialization). While the technical results might be interesting I have difficulties commenting on them without the proper background on the subject. I find as well that the technical exposition of proofs is not very clear (notations, chaining of arguments).

gaussian, learning mixture, streaming data, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.54)
Information Technology > Communications > Networks (0.40)

Add feedback

Reviews: Variance-Reduced Stochastic Gradient Descent on Streaming Data

Neural Information Processing SystemsOct-8-2024, 04:33:32 GMT

This paper considers the problem of streaming stochastic optimization taking into account the arrival patterns of examples in time. Whereas the relevant previous work focuses on learning from a stream (i.e. in one pass over the data, with O(dimension) memory), this work attempts to utilize the time spent between the arrival of examples in order to revisit previously encountered examples. The problem description is novel and appealing, both in its possible practical relevance and as an interesting new model for consideration in algorithm analysis. However, there are central issues with the comparisons that the paper draws between algorithms, both conceptually and experimentally. Conceptually, it seems incorrect to say that STRSAGA is a streaming algorithm, and in turn in to repeatedly compare it to one (such as SSVRG), since STRSAGA's memory complexity actually grows linearly with the dataset size (in order to maintain its "effective sample set").

algorithm, strsaga, variance-reduced stochastic gradient descent, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.85)

Add feedback

Learning from Streaming Data when Users Choose

Su, Jinyan, Dean, Sarah

arXiv.org Artificial IntelligenceJun-3-2024

Moreover, due to the data-driven nature of digital platforms, interesting dynamics emerge among users and service In digital markets comprised of many competing providers: on the one hand, users choose amongst services, each user chooses between multiple providers based on the quality of their services; on the other service providers according to their preferences, hand, providers use the user data to improve and update and the chosen service makes use of the user data their services, affecting future user choices (Ginart et al., to incrementally improve its model. The service 2021; Kwon et al., 2022; Dean et al., 2024; Jagadeesan et al., providers' models influence which service the 2023a). For example, in personalized music streaming platform, user will choose at the next time step, and the a user chooses amongst different music streaming user's choice, in return, influences the model update, platforms based on how well they meet the user's needs.

algorithm, learning, streaming data, (14 more...)

arXiv.org Artificial Intelligence

2406.01481

Country:

Europe > Austria > Vienna (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > California (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications (0.83)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

T-SaS: Toward Shift-aware Dynamic Adaptation for Streaming Data

Ren, Weijieying, Zhao, Tianxiang, Qin, Wei, Liu, Kunpeng

arXiv.org Artificial IntelligenceSep-5-2023

In many real-world scenarios, distribution shifts exist in the streaming data across time steps. Many complex sequential data can be effectively divided into distinct regimes that exhibit persistent dynamics. Discovering the shifted behaviors and the evolving patterns underlying the streaming data are important to understand the dynamic system. Existing methods typically train one robust model to work for the evolving data of distinct distributions or sequentially adapt the model utilizing explicitly given regime boundaries. However, there are two challenges: (1) shifts in data streams could happen drastically and abruptly without precursors. Boundaries of distribution shifts are usually unavailable, and (2) training a shared model for all domains could fail to capture varying patterns. This paper aims to solve the problem of sequential data modeling in the presence of sudden distribution shifts that occur without any precursors. Specifically, we design a Bayesian framework, dubbed as T-SaS, with a discrete distribution-modeling variable to capture abrupt shifts of data. Then, we design a model that enable adaptation with dynamic network selection conditioned on that discrete variable. The proposed method learns specific model parameters for each distribution by learning which neurons should be activated in the full network. A dynamic masking strategy is adopted here to support inter-distribution transfer through the overlapping of a set of sparse networks. Extensive experiments show that our proposed method is superior in both accurately detecting shift boundaries to get segments of varying distributions and effectively adapting to downstream forecast or classification tasks.

shift-aware dynamic adaptation, streaming data, t-sas

arXiv.org Artificial Intelligence

2309.0261

Genre: Research Report (0.40)

Technology:

Information Technology > Communications > Networks (0.80)
Information Technology > Artificial Intelligence > Machine Learning (0.53)

Add feedback

Autoencoder-based Anomaly Detection in Streaming Data with Incremental Learning and Concept Drift Adaptation

Li, Jin, Malialis, Kleanthis, Polycarpou, Marios M.

arXiv.org Artificial IntelligenceSep-5-2023

In our digital universe nowadays, enormous amount of data are produced in a streaming manner in a variety of application areas. These data are often unlabelled. In this case, identifying infrequent events, such as anomalies, poses a great challenge. This problem becomes even more difficult in non-stationary environments, which can cause deterioration of the predictive performance of a model. To address the above challenges, the paper proposes an autoencoder-based incremental learning method with drift detection (strAEm++DD). Our proposed method strAEm++DD leverages on the advantages of both incremental learning and drift detection. We conduct an experimental study using real-world and synthetic datasets with severe or extreme class imbalance, and provide an empirical analysis of strAEm++DD. We further conduct a comparative study, showing that the proposed method significantly outperforms existing baseline and advanced methods.

autoencoder-based anomaly detection, incremental learning, learning and concept drift adaptation, (1 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IJCNN54540.2023.10191328

2305.08977

Genre: Research Report (0.89)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.60)

Add feedback